Implement batched gemm bias permute for RDNA4 #3534
…rs for gridwise_gemm_wmma_cshuffle_v3, test setup for odd cases
…_bias_permute-for-rdna4
Can you also add an example for WMMA?
Resolved review threads:
.../tensor_operation/gpu/device/impl/device_batched_contraction_multiple_d_wmma_cshuffle_v3.hpp (5 threads, 3 outdated)
...ermute/device_batched_gemm_bias_permute_m2_n3_k1_wmma_c_shuffle_f16_f16_f16_f16_instance.cpp (1 thread, outdated)
ApoorvaKalyani left a comment:
Great work!
I also think we need more instances and we need to reverify the tests for those.
…e code between platforms
…tances to the test
…_bias_permute-for-rdna4
@EnricoDeg @ApoorvaKalyani Thank you for the reviews. I addressed the comments, added an example, and added a couple of instances for both the v1 and v3 pipelines. Let me know if there's still something you'd like to see changed.
LGTM
…ptors dependent on the transfer method
Great!
Proposed changes
This MR implements batched gemm bias permute for RDNA3/4. In practice, this is a multidimensional contraction operation. The MR contains the following:
- … (device_batched_contraction_multiple_d_wmma_cshuffle_v3)
- … GridwiseGemmWmmaCShuffleV3 to allow passing in non-naive grid descriptors

Note that support for different dimensions and D tensor configurations is very limited at the moment. More scaffolding would be needed to add generic support for a variable number of dimensions, but this limited implementation at least reaches parity with the XDL versions.
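To make the "multidimensional contraction" framing concrete, below is a minimal CPU reference sketch. It assumes one batch dimension G, two M dimensions, three N dimensions and one K dimension (our reading of the m2_n3_k1 instance name), a single D tensor that is simply added as a bias, and that the "permute" is expressed purely through the output strides. The struct and function names and the packed A/B layouts are hypothetical and are not the library's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not the library API). Assumed shape:
// E[g,m0,m1,n0,n1,n2] = sum_k A[g,m0,m1,k] * B[g,n0,n1,n2,k] + D[g,m0,m1,n0,n1,n2]
struct ContractionLengths {
    std::size_t G, M0, M1, N0, N1, N2, K;
};

// Strides for the six output indices (g, m0, m1, n0, n1, n2).
// Non-row-major strides for E make the output "permuted";
// a zero stride in D broadcasts the bias along that index.
using Strides6 = std::array<std::size_t, 6>;

inline std::size_t offset6(const std::array<std::size_t, 6>& idx, const Strides6& s) {
    std::size_t off = 0;
    for (std::size_t i = 0; i < 6; ++i) off += idx[i] * s[i];
    return off;
}

// Assumed layouts: A packed as [G][M0][M1][K], B packed as [G][N0][N1][N2][K].
inline void reference_contraction_bias_permute(const ContractionLengths& L,
                                               const std::vector<float>& a,
                                               const std::vector<float>& b,
                                               const std::vector<float>& d,
                                               const Strides6& d_strides,
                                               std::vector<float>& e,
                                               const Strides6& e_strides) {
    assert(a.size() == L.G * L.M0 * L.M1 * L.K);
    assert(b.size() == L.G * L.N0 * L.N1 * L.N2 * L.K);
    for (std::size_t g = 0; g < L.G; ++g)
    for (std::size_t m0 = 0; m0 < L.M0; ++m0)
    for (std::size_t m1 = 0; m1 < L.M1; ++m1)
    for (std::size_t n0 = 0; n0 < L.N0; ++n0)
    for (std::size_t n1 = 0; n1 < L.N1; ++n1)
    for (std::size_t n2 = 0; n2 < L.N2; ++n2) {
        const std::size_t a_base = ((g * L.M0 + m0) * L.M1 + m1) * L.K;
        const std::size_t b_base = (((g * L.N0 + n0) * L.N1 + n1) * L.N2 + n2) * L.K;
        float acc = 0.0f;  // contraction over the single K dimension
        for (std::size_t k = 0; k < L.K; ++k) acc += a[a_base + k] * b[b_base + k];
        const std::array<std::size_t, 6> idx{g, m0, m1, n0, n1, n2};
        // Bias is added element-wise, then written at the (possibly permuted) E offset.
        e[offset6(idx, e_strides)] = acc + d[offset6(idx, d_strides)];
    }
}
```

Passing E strides {4, 4, 2, 2, 2, 1} versus {4, 4, 1, 2, 2, 2} for a 1x1x2x1x1x2 output produces the same values in two different memory orders, which is the sense in which the output is "permuted". A GPU kernel would tile M, N and K instead of looping naively; this sketch only pins down the math and the role of the strides.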